[DEBUG] Devel fixes#22021
Closed
ldorau wants to merge 22 commits into
Closed
Conversation
4327fef to
5e73ffe
Compare
39a3ba8 to
876a9cd
Compare
- Skip peers with disabled P2P in makeProvider (USM pool creation) - Add urUsmP2PEnablePeerAccessExp / urUsmP2PDisablePeerAccessExp - Track per-device peer status in ur_device_handle_t_::peers[] - Update existing USM pool residency on P2P enable/disable Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
- Fill in three placeholder multi-device tests in memory_residency.cpp - Tests verify P2P-driven residency: absent-on-peer without P2P, enable/disable state machine checks, end-to-end data transfer Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Extract common logic from ext_oneapi_enable_peer_access and ext_oneapi_disable_peer_access into a templated p2pAccessHelper function to avoid code duplication. Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
The disablePeerAccessStateMachineAndSourceAllocationPersists test was failing intermittently because deferred frees from the preceding test complete asynchronously, causing UR_DEVICE_INFO_GLOBAL_MEM_FREE to report more free memory than the baseline captured at the start of the test. Remove the unreliable source-device free-memory assertion and the allocation it required, keeping only the state-machine checks (disable succeeds, double-disable returns UR_RESULT_ERROR_INVALID_OPERATION). The source-device allocation property is already covered by allocatingDeviceMemoryWillResultInOOM which runs first in isolation.
…oreEnablingPeerAccess Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
The peer table lives on peerDevice: peerDevice->peers[commandDevice->Id] tracks whether commandDevice is allowed to access peerDevice's allocations. Update urUsmP2PChangePeerAccessExp to lock peerDevice's mutex, read/write peerDevice's peer table, use peerDevice's platform for context iteration, and pass (peerDevice, commandDevice) to changeResidentDevice and validateP2PDevicePair. Also fix urUsmP2PPeerAccessGetInfoExp to query the peer table on peerDevice rather than commandDevice. Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Add a P2P peer-status check in command_list_manager::appendUSMMemcpy. When both source and destination are device memory and the source resides on a different device, the adapter queries the source device's peer table to verify that access has been granted to the queue's device. Returns UR_RESULT_ERROR_INVALID_OPERATION if P2P access has not been enabled. Copies to host or shared memory are always allowed regardless of P2P state. Previously, zeCommandListAppendMemoryCopy would silently succeed for cross-device copies via the copy engine regardless of P2P state, making it impossible to test that ext_oneapi_disable_peer_access actually revokes access. Also adds negative-pair tests that verify urEnqueueUSMMemcpy fails when P2P is disabled: - enablePeerAccessStateMachineAndSourceAllocationFailsWithoutP2P - p2pReadFailsWithoutPeerAccessDisabled Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Adds sycl/test-e2e/USM/P2P/p2p_usm_residency.cpp to verify that the Level Zero v2 adapter correctly handles P2P access for USM device memory between peer devices. Phase 1: Allocates memory on dev0, fills it with a known pattern, enables P2P access from dev1 to dev0, then uses dev1's queue to copy the data to the host and verifies all values match. Phase 2 (opposite direction): Allocates memory on dev1, fills it with a different pattern, enables P2P access from dev0 to dev1, then uses dev0's queue to copy the data to the host and verifies correctness. Phase 3 (negative): Enables then disables P2P access from dev1 to dev0, then attempts a memcpy via dev1's queue. The test passes if the memcpy throws an exception or if the copied data does not match the original fill pattern, confirming that ext_oneapi_disable_peer_access actually revokes access. Also adds the 'two-or-more-gpu-devices' lit feature to lit.cfg.py, set when sycl-ls reports at least two GPU devices. The test uses this feature to skip on single-GPU machines. Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
The source-device free-memory check in enablePeerAccessStateMachineAndSourceAllocation was failing intermittently because deferred frees from earlier tests complete asynchronously, causing UR_DEVICE_INFO_GLOBAL_MEM_FREE to report more free memory than the baseline. Remove the unreliable assertion. The test's actual value is in verifying the P2P state machine (double-enable returns error) and the end-to-end data transfer correctness. Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Disable P2P access before freeing the allocation. The previous order freed the memory while P2P was still enabled, leaving a brief window where the peer device held access rights to a released allocation. The correct cleanup sequence is to revoke peer access first and then free the memory. Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Replace the nested if-ladder in appendUSMMemcpy with a flat static helper function checkP2PAccess that uses early returns, as suggested in the review. The logic is identical; the refactoring makes the control flow easier to follow. Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
…bled test The old comment claimed that P2P must be enabled before the allocation, which is incorrect. urUsmP2PEnablePeerAccessExp calls changeResidentDevice on all contexts after updating the peer-status table, which retroactively makes already-existing allocations resident on the peer device. Enabling P2P after allocation is therefore equally valid, as demonstrated by the allocBeforeEnablingPeerAccess test. Update the comment to reflect the actual adapter behaviour. Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
The test exercises exactly the same code path as the existing enablePeerAccessStateMachineAndSourceAllocationFailsWithoutP2P: both allocate on devices[0] without P2P, attempt a copy via devices[1]'s queue, and assert UR_RESULT_ERROR_INVALID_OPERATION. Remove the duplicate to avoid maintaining two tests that provide no additional coverage. Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
The test was calling Devs[0].ext_oneapi_enable_peer_access(Devs[1]), which sets Devs[1]->peers[Devs[0].Id] = ENABLED, meaning Devs[0] can access Devs[1]'s memory. However the actual P2P copy was Queues[1].copy(arr0, arr1, N) — Devs[1]'s queue reading arr0 which lives on Devs[0] — which requires the opposite direction: Devs[0]->peers[Devs[1].Id] = ENABLED, set by Devs[1].ext_oneapi_enable_peer_access(Devs[0]). The wrong direction was harmless with the L0v1 adapter (no peer-access check on memcpy), but the L0v2 adapter's checkP2PAccess enforces the correct direction and returned UR_RESULT_ERROR_INVALID_OPERATION. Also add ext_oneapi_disable_peer_access before freeing arr0, consistent with the rule that peer access should be revoked before the guarded allocation is released. Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Verify that revoking peer access prevents subsequent USM copies. The test enables P2P (devices[1] -> devices[0]), confirms that a urEnqueueUSMMemcpy succeeds, then disables P2P and asserts that the same copy returns UR_RESULT_ERROR_INVALID_OPERATION. Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Verify the transition from blocked to permitted: attempt a USM copy from devices[1]'s queue without P2P (expects UR_RESULT_ERROR_INVALID_OPERATION), then enable P2P on the same allocation and retry (expects success with correct data). Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Add testP2PReadFailsWithoutEnable: allocates on dev0, attempts a device-to-device memcpy via dev1's queue without ever calling ext_oneapi_enable_peer_access, and asserts a SYCL exception is thrown. This covers the case where P2P was never enabled (Phase 3 already covers the revoke case). Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
… P2P enable Add testP2PReadFailsThenSucceedsAfterEnable: attempts a device-to-device memcpy without P2P (expects a SYCL exception), then enables P2P on the same allocation and retries (expects success with correct data), then disables P2P and cleans up. Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
a64c94c to
de3ef07
Compare
When urContextCreate reuses the driver default context via zeDriverGetDefaultContext, the resulting ur_context_handle_t_ was not added to hPlatform->Contexts. urUsmP2PEnablePeerAccessExp / urUsmP2PDisablePeerAccessExp iterate Platform->Contexts to call changeResidentDevice() on every live context, so any context created through that fast path would silently miss all P2P residency updates — memory would not be made resident on newly enabled peers, and would not be evicted on disable. Fix by adding the context to hPlatform->Contexts in the fast path, the same way the regular zeContextCreate path and urContextCreateWithNativeHandle already do. Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
… memory residency tests The tests were using 1 MB allocations, which fall within the disjoint pool's MaxPoolableSize of 4 MB for device USM. Pooled allocations are not immediately returned to the UMF memory provider on free; they stay in the pool's free-list cache with stale per-device residency from the preceding test. This cross-test interference caused allocationNotResidentOnPeerWithoutP2P to observe a free-memory decrease on the peer device that exceeded allocSize, failing the assertion. Define a single file-scope constant kAllocSize = 5 MB (> 4 MB) and use it in all tests. Allocations above MaxPoolableSize bypass the pool's free-list cache and go directly to the UMF level_zero memory provider, where residency is set at allocation time based on the current P2P state, and released when the allocation is freed back to the OS. Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
When a buffer on a discrete GPU needs to be accessed from a different device and P2P access is not enabled, migrate the data through a temporary host buffer instead of returning UR_RESULT_ERROR_UNSUPPORTED_FEATURE. Before migrating, wait for pending operations (from the wait list) to complete, ensuring that prior kernel writes to the buffer are visible. Fixes: intel#22007 Fixes: intel#22008 Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
…essible When a discrete buffer needs to be migrated between devices via the host (because P2P access is not enabled), improve the implementation in three ways: 1. Use a USM HOST allocation for the staging buffer instead of a std::vector<char>. USM HOST memory is accessible by all devices in the context, making it more suitable as an intermediate staging area. 2. Fix event ordering: append zeCommandListAppendWaitOnEvents (when there are wait events) to order the migration relative to all in-flight work on the destination command list, then drain it with zeCommandListHostSynchronize before reading source device memory. 3. Make the host->device copy async: after the synchronous device->host copy completes (using the source device's own synchronous command list, since the destination device cannot access source device memory without P2P), enqueue only the host->device copy on the provided command list. Host memory is accessible by all devices, so this is safe. The staging buffer is stored in a new migrationStagingBuffers member of ur_discrete_buffer_handle_t and freed when the buffer is released, ensuring it remains valid for the duration of the async copy. A fully synchronous fallback is kept for the case where no command list is available (e.g. urMemGetNativeHandle). Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.